Visual object tracking is an important task in computer vision. To achieve high-performance tracking, a large number of object tracking methods have been proposed in recent years. Among them, Transformer-based methods have become a hot topic in visual object tracking because of their ability to perform global modeling and capture contextual information. Firstly, existing Transformer-based visual object tracking methods were classified according to their network structures, their underlying principles and the key techniques for model improvement were reviewed, and the advantages and disadvantages of the different network structures were summarized. Then, the experimental results of Transformer-based visual object tracking methods on public datasets were compared to analyze the impact of network structure on performance. Among these methods, MixViT-L (ConvMAE) achieved tracking success rates of 73.3% and 86.1% on LaSOT and TrackingNet, respectively, showing that object tracking methods based on a pure Transformer two-stage architecture have better performance and broader development prospects. Finally, the limitations of these methods, such as complex network structures, large numbers of parameters, demanding training requirements, and difficulty of deployment on edge devices, were summarized, and future research directions were discussed: by combining model compression, self-supervised learning, and Transformer interpretability analysis, more feasible solutions for Transformer-based visual object tracking may be developed.
In open social text generation, the generated content lacks personalized features. To address this problem, a user-level fine-grained controllable generation model, PTG-GPT2-Chinese (Personalized Text Generation Generative Pre-trained Transformer 2-Chinese), was proposed. Built on the GPT2 (Generative Pre-trained Transformer 2) structure, the model adopts an Encoder-Decoder framework. First, the user's static personalized information was modeled and encoded on the Encoder side; a bidirectional independent attention module was added on the Decoder side to receive the static personalized feature vector, while the attention module of the original GPT2 structure was used to capture the dynamic personalized features in the user's text. Then, the scores of the different attention modules were dynamically weighted, fused, and passed to subsequent decoding, so that social text constrained by the user's personalized feature attributes was generated automatically. However, the semantic sparsity of a user's basic information may cause conflicts between the generated text and some personalized features. To address this, a BERT (Bidirectional Encoder Representations from Transformers) model was used to perform a secondary enhanced generation step that enforces consistency between the Decoder output and the user's personalized features, finally realizing personalized social text generation. Experimental results show that, compared with the GPT2 model, the proposed model improves fluency by 0.36% to 0.72%, and that, without loss of language fluency, the secondary generation increases the personalization and consistency metrics by 10.27% and 13.24%, respectively. These results show that the proposed model can effectively assist users in writing and generate fluent social text personalized for each user.
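The abstract does not state how the attention scores are weighted and fused. One common way to realize dynamic weighted fusion is a learned sigmoid gate that mixes two attention outputs; the sketch below assumes that design and treats the "scores" as the output vectors of the static (Encoder-side) and dynamic (original GPT2) attention modules. The names `gated_fusion`, `gate_weights` and `gate_bias` are hypothetical and are not taken from PTG-GPT2-Chinese.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(static_out: Sequence[float], dynamic_out: Sequence[float],
                 gate_weights: Sequence[float], gate_bias: float) -> List[float]:
    """Hypothetical gated fusion of two attention outputs (an assumed design).

    g = sigmoid(w . [static; dynamic] + b); fused = g * static + (1 - g) * dynamic.
    Because g depends on both inputs, the mixing weight changes per position.
    """
    if len(static_out) != len(dynamic_out):
        raise ValueError("attention outputs must have equal length")
    concat = list(static_out) + list(dynamic_out)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    # Scalar gate in (0, 1) computed from both attention outputs.
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    # Convex combination of the static and dynamic feature vectors.
    return [g * s + (1.0 - g) * d for s, d in zip(static_out, dynamic_out)]
```

With all-zero gate weights and a zero bias the gate equals 0.5, so `gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0] * 4, 0.0)` returns the element-wise mean `[2.0, 3.0]`; a large positive bias pushes the output toward the static personalized features, a large negative one toward the dynamic features.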
Existing similarity-based trajectory prediction algorithms for moving targets are generally classified by the spatio-temporal characteristics of the data, which does not reflect the characteristics of the algorithms themselves. Therefore, a classification method based on algorithm characteristics was proposed. Trajectory similarity algorithms need the distance between two points for their subsequent calculations; however, the commonly used Euclidean Distance (ED) is only applicable to targets moving within a small region. For predicting the trajectories of sea targets moving over a large region, a similarity calculation method using geodetic distance instead of ED was proposed. Firstly, the trajectory data were preprocessed and segmented. Then, the discrete Fréchet Distance (FD) was adopted as the similarity measure. Finally, the method was tested on synthetic and real data. Experimental results indicate that when sea targets move over a large region, the ED-based algorithm may produce incorrect predictions, while the geodetic distance-based algorithm can output correct trajectory predictions.
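A minimal sketch of the two ingredients, assuming trajectories are lists of (latitude, longitude) pairs in degrees. Geodetic distance is approximated here by the haversine great-circle distance on a sphere of mean radius 6,371 km; an ellipsoidal geodesic on WGS-84 (for example Vincenty's or Karney's method) would be more accurate, and the source does not say which formula was used. The discrete Fréchet distance follows its standard dynamic-programming definition with a pluggable point-distance function. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def great_circle_m(p: Point, q: Point) -> float:
    """Haversine distance in metres (spherical stand-in for geodetic distance)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def euclidean_deg(p: Point, q: Point) -> float:
    """Naive ED that treats (lat, lon) degrees as planar coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def discrete_frechet(a: Sequence[Point], b: Sequence[Point],
                     dist: Callable[[Point, Point], float]) -> float:
    """Discrete Fréchet distance via dynamic programming, O(len(a) * len(b))."""
    if not a or not b:
        raise ValueError("trajectories must be non-empty")
    n, m = len(a), len(b)
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best coupling reaching (i, j), then the bottleneck distance.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

Two point pairs one degree of longitude apart have the same planar distance of 1.0 in degree units, yet on the Earth the pair on the equator is about 111 km apart and the pair at 60° N only about 56 km apart. This distortion grows with the size of the region and is why ED on raw coordinates can mislead similarity-based prediction for sea targets.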
Given multiple sequences, a support threshold, and gap constraints, the objective is to discover frequent patterns whose support in the multiple sequences is no less than the given threshold, where any two successive elements of a pattern satisfy the user-specified gap constraints and any two occurrences of a pattern in a given sequence satisfy the one-off condition. Existing algorithms for this problem consider only the first occurrence of each character of a pattern when computing the pattern's support in a given sequence, so many frequent patterns are missed. To overcome this, an efficient algorithm for mining multiple sequential patterns with gap constraints, named MMSP, was proposed. It first stored the candidate positions of a pattern in a two-dimensional table, and then selected positions from the candidates according to the left-most strategy. Experiments were conducted on DNA sequences. When the number of multiple sequence elements was fixed and the sequence length varied, MMSP mined 3.23 times as many frequent patterns as the related algorithm M-OneOffMine; when the number of multiple sequence elements varied, MMSP mined on average 4.11 times as many patterns as M-OneOffMine. When the number of multiple sequence elements varied, the average number of patterns mined by MMSP was 2.21 and 5.24 times that mined by M-OneOffMine and MPP, respectively, and the frequent patterns mined by M-OneOffMine were a subset of those mined by MMSP. The experimental results show that MMSP mines more frequent patterns in less time and is more suitable for practical applications.
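The idea of storing candidate positions per pattern character in a two-dimensional table and choosing positions left-most first can be sketched as follows. This is a simplified illustration under stated assumptions, not the MMSP algorithm itself: the gap constraint [min_gap, max_gap] is taken to bound the number of wildcard positions between successive pattern characters, the one-off condition is enforced by never reusing a sequence position, and the multi-sequence support is assumed to be the sum of the per-sequence supports. Because occurrences are chosen greedily, the count is a lower bound and need not be the maximum number of one-off occurrences. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def candidate_table(seq: str, pattern: str) -> List[List[int]]:
    """2-D table: row j lists, in ascending order, positions where seq == pattern[j]."""
    return [[i for i, c in enumerate(seq) if c == ch] for ch in pattern]


def leftmost_occurrence(table: List[List[int]], used: Set[int],
                        min_gap: int, max_gap: int) -> Optional[List[int]]:
    """Depth-first search for the left-most occurrence using only unused positions."""
    m = len(table)

    def extend(j: int, prev: int, acc: List[int]) -> Optional[List[int]]:
        if j == m:
            return acc
        for pos in table[j]:
            gap = pos - prev - 1  # number of wildcard positions in between
            if gap < min_gap:
                continue
            if gap > max_gap:
                break  # rows are sorted, so later positions only widen the gap
            if pos in used:
                continue  # one-off: each position may appear in one occurrence
            found = extend(j + 1, pos, acc + [pos])
            if found is not None:
                return found
        return None

    for start in table[0]:
        if start not in used:
            found = extend(1, start, [start])
            if found is not None:
                return found
    return None


def one_off_support(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Greedy count of position-disjoint occurrences of pattern in seq."""
    if min_gap < 0 or max_gap < min_gap:
        raise ValueError("require 0 <= min_gap <= max_gap")
    if not pattern:
        return 0
    table = candidate_table(seq, pattern)
    used: Set[int] = set()
    count = 0
    while (occ := leftmost_occurrence(table, used, min_gap, max_gap)) is not None:
        used.update(occ)
        count += 1
    return count


def multi_sequence_support(seqs: Sequence[str], pattern: str,
                           min_gap: int, max_gap: int) -> int:
    """Assumed definition: sum of the one-off supports over all sequences."""
    return sum(one_off_support(s, pattern, min_gap, max_gap) for s in seqs)
```

For example, `one_off_support("ACACAC", "AC", 0, 1)` selects the occurrences at positions [0, 1], [2, 3] and [4, 5] and returns 3. A pattern would then be reported as frequent when `multi_sequence_support` reaches the user-given threshold.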